Species delineation and genetic structure of two Chaerephon species (C. pusillus and C. leucogaster) on Madagascar and the Comoro archipelago

Abstract Cryptic species diversity is known to be common in bats but remains challenging to study in these mammals, whose natural history traits render their sampling and monitoring challenging. For these animals, indirect genetic approaches provide a powerful tool to gain insight into the evolutionary history and ecology of cryptic bat species. The speciation history of the polyphyletic Chaerephon pumilus species group (Molossidae) is poorly understood, including those found on western Indian Ocean islands. Two species in this complex have been identified in the Comoros: C. pusillus and C. leucogaster. Here, we aim to genetically characterize these two species and investigate their spatial population genetic structure. Analyzing five nuclear microsatellite markers from 200 individuals and one mitochondrial DNA gene (Cyt‐b) from 161 (out of the 200) individuals sampled on Madagascar and the Comoros, our findings indicated that these species are genetically differentiated. We observed mitonuclear discordance in numerous individuals (33% of the 161 mtDNA‐sequenced individuals). Based on ABC analyses, we found that this pattern could potentially be the result of asymmetric introgressive hybridization from C. leucogaster to C. pusillus and calls for further studies on the demographic history of these species. Moreover, at the intra‐specific level, analyses of the microsatellite loci suggested the evidence of a more pronounced, although weak, geographically based genetic structure in C. pusillus than in C. leucogaster. Altogether, our findings provide preliminary insights into the eco‐evolutionary aspects of this species complex and warrant further research to understand hybridization dynamics and mechanisms responsible for mitonuclear discordance.


| INTRODUC TI ON
Due to morphologically cryptic taxa (i.e., species that are genetically distinct but morphologically difficult to differentiate), biodiversity measures are often underestimated (Bickford et al., 2007). The direct study of the ecology of a given species or species complex can nevertheless be challenging, for example, when they have the overall capacity for daily movements, disperse over long distances, or are difficult to capture and recapture. DNA sequencing approaches provide insights into this hidden diversity (e.g., Hebert, Cywinska, et al., 2003;Hebert, Ratnasingham, et al., 2003). Once such cryptic species have been identified, studying aspects of their ecology and evolutionary history is of prime importance and provides knowledge for their management and conservation. The study of nonrecombining mitochondrial DNA does not always provide the correct determination of a species' phylogenetic relationships but does provide insights into the origin of currently observed populations and their colonization history (e.g., Ndiaye et al., 2016;Zahiri et al., 2019).
On the other hand, studying fast-evolving genetic markers, such as microsatellites loci, is a powerful approach to inferring individual behavior (e.g., dispersal, sex-biased dispersal, Moore et al., 2020), as well as the recent demographic history of populations (e.g., introductions or bottlenecks, Biebach & Keller, 2009;Portanier et al., 2017), and the interactions between the environment and gene flow (e.g., population and landscape genetics, Coulon et al., 2004;Portanier et al., 2018). Finally, since genetic diversity is linked to the adaptive potential of populations (Frankham et al., 2004) and thus to their persistence (e.g., inbreeding depression, Keller & Waller, 2002), the genetic characterization of populations helps to assess aspects related to their conservation needs. Such studies also help to determine whether hybridization occurs between different species, which is crucial when addressing species delimitations, especially in sympatric and cryptic species (e.g., Filippi-Codaccioni et al., 2018).
In bats, much cryptic diversity is known to occur, which has been the focus of considerable research efforts in the last two decades (e.g., Ashrafi et al., 2013;Filippi-Codaccioni et al., 2018;Jones, 1997).
However, because of their nocturnal habits, generally small size and considerable mobility, bats are difficult to study and monitor in the wild. The genetic approach is thus an important method to provide a new understanding of these animals and has become extremely important in the field of conservation genetics (Dool, 2020;Tournayre et al., 2019). However, many bat genera and species complexes remain largely unstudied, especially for genetic markers (Dool, 2020).
The Comoro archipelago comprises four islands (Grande Comore or Ngazidja, Mohéli or Mwali, Anjouan or Ndzuwani, and Mayotte, Figure 1) midway between Madagascar and the African continent, and separated by approximately 300 km of sea from these two areas.
The Comoro islands were formed in situ by volcanic activity 15 million years ago (MYA; Emerick & Duncan, 1982, 1983: Mayotte is the oldest at 10-15 MYA, then Mohéli and Anjouan at 3.9-5 MYA, and Grande Comore is the youngest at 0.13-0.5 MYA (Emerick & Duncan, 1982;Nougier et al., 1986). The ring theory proposed by Naidoo et al. (2016) suggests that an African source population of the clade C. pumilus, for which genetic data suggest differentiated between 0.88 and 2.39 MYA, may have colonized the Comoro archipelago to form one arc of the ring and gave rise to C. pusillus species. The other arc of the ring probably colonized Madagascar and differentiated as C. leucogaster (among others, Goodman, Buccas, et al., 2010). The current sympatric distribution of C. leucogaster and C. pusillus (Goodman, Weyeneth, et al., 2010) in Mayotte raises the possibility of an expansion of C. leucogaster from Madagascar to the Comoros, thereby completing the ring.
Despite their complex evolutionary history, Chaerephon bats from the western Indian Ocean islands have only attracted research interest in the last two decades. Although the genus as currently defined is paraphyletic (Lamb et al., 2011), the C. pumilus group is monophyletic. The ecology and phylogeography for most species in this group remain poorly known, and species identification based on morphology is still provisional for specific taxa. Indeed, while cranial and dental characters have been used to distinguish C. leucogaster from C. pusillus (Goodman, Weyeneth, et al., 2010), the high degree of phenotypic variation observed in the genus (Jacobs et al., 2004;Simmons, 2005) makes species identification in some cases challenging, mainly where morphologically similar species occur in sympatry. This is the case in Mayotte, where both C. leucogaster and C.
pusillus are present (Goodman, Weyeneth, et al., 2010). The presence of a white patch colouration in C. leucogaster, as compared to dark brown in C. pusillus and occasionally with a narrow white band, as well as differences in tragus shape, may be used to differentiate the two species in the field (Goodman, Weyeneth, et al., 2010;

T A X O N O M Y C L A S S I F I C A T I O N
Population genetics Ratrimomanarivo et al., 2009). Some individuals show mixed pelage characteristics, including white spotting in the brown fur, mainly on the ventrum, sometimes on the neck, the back, or the flank. This raises the question of hybridization between these species, which has not been tested using nuclear markers. Hence, genetic studies might provide new insights into the delimitation of these two species and additional information into the evolutionary history and ecology of the C. pumilus species group members. In the present study, we thus investigated the genetic variability and population structure of C. leucogaster and C. pusillus from Madagascar and the Comoros, using the mitochondrial cytochrome b gene (Cyt-b, mt DNA) and five polymorphic nuclear microsatellite loci ( Nuc DNA) for 161 and 200 individuals, respectively. Our research aimed to genetically characterize the cryptic species C. leucogaster and C. pusillus occurring on these islands, and describe their intra-specific spatial population genetic structure. F I G U R E 1 Summary of results based on the mitochondrial and nuclear markers of the 200 individuals of Chaerephon species analyzed from the Comoro archipelago and Madagascar. The number of individuals with different mitochondrial and nuclear DNA combinations is indicated for each site. C.leuco.M: individuals tentatively identified as C. leucogaster based on morphology. Unk: individuals not morphologically identified. Geographic (in km) and genetic (F ST ) distances are indicated between and within species.

TA B L E 1
The number of individuals of Chaerephon species analyzed and their morphological identification for populations from Madagascar (former province names listed) and the Comoro archipelago (see Figure 1).

| Sampling
Two hundred individuals of Chaerephon were sampled on islands of the Comoro archipelago (Grande Comore, Anjouan, Mohéli, and Mayotte, the first three islands part of the Union of the Comoros and the latter a French department), and in three areas of Madagascar (ex-provinces of Antsiranana, Mahajanga, and Toliara; see Table 1 and Figure 1). Bats were captured using mist nets and harp traps.
Captured individuals were removed from the trapping devices and placed in cloth bags until they were examined. According to current (a photo taken with a digital camera for the 24 individuals that displayed a more or less extensive white spotting as described above, see Figure S1). The 200 sampled individuals included 116 females and 82 males. Six of these individuals captured in Mayotte were provisionally identified as C. leucogaster based on external morphology and referred to herein as "6 C.leuco.M" for "6 C. leucogaster from Mayotte" (Table 1). Finally, two individuals captured in Mayotte were not identified or sexed and were referred to as "2unk" ( Table 1).  Table S1 and analyzed with an automated sequencer. These five microsatellite markers were developed for a previous study on C. pumilus (Naidoo et al., 2016).

| Delimitation of species
To genetically verify the species identification, we first built a phylogenetic tree using the Cyt-b sequences generated in this study and additional sequences retrieved from public databases (6 sequences of C. pusillus, 12 of C. leucogaster, and two sequences of C. atsinanana as an outgroup, Genbank database, NCBI, see Figure 2). The Cyt-b sequences were aligned using the webPRANK server (http://www.ebi. ac.uk/goldm an-srv/webpr ank/) with default parameters (Löytynoja & Goldman, 2010). The phylogenetic tree was constructed using the Bayesian method implemented in MrBayes 3.2.7 software (Ronquist et al., 2012), with 10,000,000 generations. The automatic Smart Model Selection (SMS) in PhyML (Lefort et al., 2017) was employed to choose the adequate substitution model (HKY85 + G + I), based on Akaike Information Criterion (AIC, Burnham & Anderson, 2002).
Since it has been proposed that C. pumilus has common ancestors that gave rise to C. pusillus and C. leucogaster species (Naidoo et al., 2016), we further constructed a phylogenetic tree including Cyt-b sequences from C. pumilus to have an overview of the relationship between these regional species of Chaerephon. To decipher the F I G U R E 2 Bayesian phylogenetic tree of Chaerephon species based on cytochrome b sequences from 161 individuals sequenced herein and 75 sequences retrieved from Genbank. Node values correspond to posterior probabilities. Nodes supported by more than 0.9 are indicated. Chaerephon atsinanana is used as an outgroup. Individuals sequenced in this study are highlighted in blue and red for C. pusillus and C. leucogaster, respectively. The black-colored sequences are publicly available sequences from both species. In purple, 53 individuals morphologically identified as C. pusillus, assigned to C. pusillus clusters based on Nuc DNA but carrying C. leucogaster mt DNA, referred to as C. pusillus-C. leucogaster hybrids. From these 53 presumably hybrids bats, six (indicated with •) individuals (6 C.leuco.M) were morphologically identified as C. leucogaster, while the remaining were morphologically assigned to C. pusillus. The Malagasy ex-province locations are indicated as follows: Ant, Antsiranana; Tol, Toliara; and Mah, Mahajanga. The tree scale bar indicates the number of nucleotide substitutions per site.
Genetic clustering was also carried out using the five microsatellite markers mentioned earlier, developed by Naidoo et al. (2016).
The Bayesian clustering algorithm implemented by the software STRUCTURE v.2.3.4 (Pritchard et al., 2000;Pritchard & Wen, 2010) was used to test the likelihood of different clustering solutions (K: number of clusters varying from one to 10, with 20 repetitions for each value of K). We used the admixture model with correlated allele frequencies and a burn-in period of 300,000 steps followed by 1,000,000 MCMC repeats. The optimal number of genetic clusters was determined using both the log posterior probability of the data Ln Pr(X|K) and the Evanno ∆K method (Evanno et al., 2005) as implemented in the online software STRUCTURE HARVESTER v.0.6.94 (Earl & vonHoldt, 2012). Independent runs for the different K values were averaged using CLUMPP v.1.1.2 (Jakobsson & Rosenberg, 2007) as implemented in CLUMPAK (Kopelman et al., 2015), also implementing the DISTRUCT (Rosenberg, 2004) procedure to display graphical results. In CLUMPAK, the LargeKGreedy algorithm was used, with a random input order and 2000 repeats.

| Gene flow between species and historical demography
As potential incongruent patterns between mitonuclear markers could be the result of incomplete lineage sorting or hybridization, we tested whether gene flow between C. pusillus and C. leucogaster may explain the observed patterns (see Section 3) using an Approximate Bayesian Computation (ABC) framework (Beaumont et al., 2002). We ran the ABC analyses, with the ABCtoolbox package v.2.0 (Wegmann et al., 2010), and the programs SIMCOAL 2.0 and ARLSUMSTAT (Laval & Excoffier, 2004), on two different datasets. In the first dataset, the incongruent individuals were grouped with the C. leucogaster clade, whereas in the second set, they were included in the C. pusillus clade. All scenarios were run using the mtDNA (including C. pumilus sequences) and microsatellite loci. We   (Bonferroni, 1936). Finally, isolation by distance inter-specific patterns was assessed between geographic locations within species by regressing pairwise linearized F ST (F ST /(1-F ST )) and log-transformed geographic distances, using Mantel tests (9999 permutations, "ecodist" R package, Goslee & Urban, 2007), following Rousset (1997).
Genetic variability and population structuration were also determined from Cyt-b sequences. First, intra-specific diversity was evaluated by calculating nucleotide diversity (i.e., a set of genetic alleles usually transmitted together) per species with ARLEQUIN software.
Subsequently, the haplotype network, allowing the reconstruction of genealogic links between haplotypes of each species, was built using the NETWORK software v.5.0.1.1, with the Median Joining method (fluxu s-engin eering.com, Bandelt et al., 1999).

| Microsatellite characterization
The five microsatellite markers were successfully amplified for the 200 bat samples with a mean amplification rate of 99.4%. All markers were polymorphic, with the number of alleles ranging between three and 35 (see Table 2). MICROCHECKER indicated low null allele frequencies (<0.10) for all loci in both species (

| Delimitation of species
Based on Cyt-b polymorphism, phylogenetic reconstruction from the 179 Chaerephon (161 sampled individuals + 18 sequences from Genbank database) individuals suggested two highly supported clades (bootstrap 100%) corresponding to C. leucogaster and C. pusillus ( Figure 2). These two clades showed a genetic divergence of 8.0% at the inter-clade level but 2.0% and 1.3% within the C. pusillus and C. leucogaster clades, respectively. These divergence rates were similar when the introgressed individuals (see below) were excluded from the analysis, with an inter-specific divergence of 7.42% and an intra-specific divergence of 2.00% and 0.67% within C. pusillus and C. leucogaster, respectively. By including 59 sequences of C. pumilus from Genbank in the phylogenetic tree, we found the same distinct clades of C. pusillus and C. leucogaster ( Figure S2).
Although our nuclear markers were limited to five polymorphic loci, the microsatellite data also supported this strong species distinction. STRUCTURE clustering procedure indicated an optimal number of clusters of K = 3 (maximum likelihood) or K = 2 (Evanno method, Figure S3A). When considering K = 2, the two clusters obtained coincide with the two species as morphologically identified F ST values were always high and significantly different from zero (exact G-tests, p < .001, and IC95% not including zero, Given that private alleles are specific and unique to a population or species, they can be highly informative and allow the precise delimitation of species. We searched for private alleles based on our clustering results and identified two potential private alleles at locus TabrH6 in C. leucogaster. While both markers discriminated between the two clusters, they nevertheless also displayed some incongruences between morphological and genetic species assignment. Among the 161 individuals sequenced for the Cyt-b gene, 62 individuals were assigned to C. pusillus and 46 to C. leucogaster based on molecular ( mt DNA and Nuc DNA) and morphological data ( Figure 1). Interestingly, 53 of the remaining individuals (representing 33%) were differently assigned based on their mitochondrial and nuclear DNA: they carried mt DNA haplotypes of C. leucogaster while bearing C. pusillus Nuc DNA (stars, were morphologically identified as C. pusillus (Figure 2), and six were assigned to C. leucogaster (6 C.leuco.M, Figure 2).

TA B L E 3 Genetic differentiation between
Chaerephon leucogaster and C. pusillus (based on both STRUCTURE assignment and morphological identification) with or without the 6 C.leuco.M and 2unk individuals (see main text) and considering these individuals in different species.

| Gene flow between species and historical demography
Regardless of the dataset used for the ABC analysis, the posterior probabilities significantly selected model 3, which assumes recent gene flow between the species since their divergence (pp = 0.51-0.59; Figure 4). The two other models (Figure 4, models 1 and 2) had low posterior probabilities (pp = 0.14-0.15 and pp = 0.26-0.34, respectively) and were rejected. The goodness of fit between the selected model and the observed data was validated by the PCA analysis in which the simulated data fit well with the observed data ( Figure S4). The ABC analyses showed a good power of discrimination between the three models ( Figure

| Genetic species diversity
We further investigated the genetic diversity and structure at the intra-specific level for each species, assigning individuals to one species based on their assignment following the STRUCTURE analysis (i.e., the 6 C.Leuco.M were considered as C. pusillus and the 53 C. pusillus carrying C. leucogaster mt DNA were considered as C. pusillus). We found that the allelic richness of microsatellite loci tended to be lower in C. pusillus than in C. leucogaster (Table 2,  Table 2) heterozygosities were considerably lower in C. pusillus than in C. leucogaster. Interestingly, this pattern was also observed with the Cyt-b sequences for which F I G U R E 4 Graphical representation of the ABC models testing the demographic histories of Chaerephon leucogaster and C. pusillus divergence. TdP and TdL refer to the time of divergence events (in generation numbers, assuming a generation time of 5 years); Dbl and Dbp to the duration of bottleneck events; Tmr and Tma to the time of recent and ancient gene flow, respectively. Details on the parameter priors are given in Table S2.

F I G U R E 5
Haplotype network for the 161 individuals of Chaerephon species. Haplotype distribution of the two major clades corresponding to C. leucogaster and C. pusillus on Madagascar and in the Comoro archipelago. Letters correspond to the haplotypes, and numbers are the total individuals belonging to each haplotype.
nine haplotypes were identified in the C. leucogaster phylogenetic clade. At the same time, only five were detected in the C. pusillus clade (i.e., excluding the 47 individuals morphologically identified as C. pusillus, which carried C. leucogaster mitochondrial haplotypes, Figure 5). Nucleotide diversity in C. leucogaster populations was almost twice as high as in C. pusillus (π = 0.0022 and 0.0013, respectively).
Within-species population structure was assessed through clustering analyses and F ST computations. The optimal clustering solution assessed by a maximum likelihood was K = 1 for both C.

| DISCUSS ION
Altogether, the genetic analyses performed in the present study indicated that C. pusillus and C. leucogaster from the Comoro archipelago and Madagascar are genetically distinct. Indeed, the mitochondrial phylogenetic reconstruction shows the existence of two strongly supported clades, mostly corresponding to individuals morphologically identified as C. leucogaster or C. pusillus, except 53 C. pusillus individuals showing evidence of introgressed mt DNA of C. leucogaster. This was further supported by a higher inter-species divergence (7.4%) compared with the intra-specific divergence (2.0% and 0.7% for C. leucogaster and C. pusillus, respectively). Likewise, this genetic distinction was also highlighted by microsatellite-based population genetics analyses, which indicated the presence of two well-supported genetic clusters corresponding to morphological identifications (except for the 6 C.leuco.M individuals). Accordingly, a solid genetic differentiation (F ST = 0.25) was detected between the species, and, although this needs to be confirmed with a larger number of sample sizes and polymorphic markers, two alleles seemed  Figure S12). Furthermore, the six individuals identified in the field in Mayotte as C. leucogaster carried the mitochondrial C. leucogaster clade haplotypes but grouped with C.
pusillus genetic cluster based on microsatellites. While species are traditionally described using morphological characters, these results highlight how genetic tools bring complementary information for species identification in this cryptic group.

| Haplotype distribution of Cyt-b
Genetics is also a helpful tool to unravel the colonization history of species. Regarding the complex C. pumilus, Naidoo et al. (2016) proposed the "ring hypothesis," suggesting that C. leucogaster and

| Mitonuclear discordance
Our present dataset suggests that 53 individuals out of 161 (33%) carried the Cyt-b haplotype of C. leucogaster while bearing the Nuc DNA from C. pusillus. Interestingly, nearly half of the C. pusillus individuals sequenced for mt DNA bear the Cyt-b haplotype of C. leucogaster, suggesting potential incomplete lineage sorting or asymmetric hybridization between species. While it is difficult to distinguish between the processes behind this pattern (e.g., Holder et al., 2001;Holland et al., 2008;Joly et al., 2009) While hybridization is a broad phenomenon in bats due to their social life and mating system (e.g., Bogdanowicz et al., 2012;Mao et al., 2010), asymmetric hybridization seems to be less common with only a few reports, for example, between Myotis myotis and M. blythii (Berthier et al., 2006), or Pteronotus rubiginosus and P. alitonus (Filippi-Codaccioni et al., 2018). Different processes may explain such a phenomenon, including disassortative mating, demographic dynamics of local and colonizing species, differential selection of mitochondrial variants, or a combination of factors (Toews & Brelsford, 2012). Deciphering the underlying mechanisms between C. leucogaster and C. pusillus requires further research, including behavior (dispersal bias, mating), population demography (difference in population size), and physiological features (genetic incompatibilities), especially within the zone of sympatry (Mayotte). Furthermore, a large-scale analysis using markers from multiple genetic systems (e.g., mitochondrial, nuclear autosomes, and sex chromosomes), as well as environmental data, should provide the needed insights to disentangle the different ecological and evolutionary mechanisms that contribute to producing asymmetric introgression between these two species.

| Within-species genetic variability and population genetic structure
Within species, comparisons of genetic structure through clustering approaches suggested that individuals distributed across the sampled geographic areas of each species formed panmictic populations. However, it is noteworthy that although low, the F ST values differed significantly from zero at the intra-population level for both species (Figure 1, Tables S5

| CON CLUS ION
Our study brings new insights into the ecology and phylogeography of C. leucogaster and C. pusillus living on islands in the western Indian Ocean. The evolutionary history of these species from Madagascar and Comoros is complex. To advance even further on the aspects addressed in this paper, high-throughput genome analyses will help uncover if contemporary hybridization occurs on the different islands (other than Mayotte), identify the classes of hybrids, and test alternative hypotheses that may explain the observed mitonuclear discordances. Indeed, for now, it is not possible to disentangle whether these discordances are the result of asymmetric hybridization or incomplete lineage sorting. Further studies using the approach of demographic inferences would allow discrimination between these processes and new insights into the population structure of C. leucogaster and C. pusillus. One example might include effective population size estimations, which might differ between species, as suggested by the observed differences in genetic diversity. Indeed, mitochondrial sequences and microsatellite loci consistently showed lower levels of genetic variation in C. pusillus than in C. leucogaster. Populations of C. pusillus inhabit small islands (from 290 km 2 for Mohéli to 1659 km 2 for Grande Comore) as compared to C. leucogaster occurring on Madagascar (587,041 km 2 ), and smaller population sizes might thus be expected for C. pusillus, being more susceptible to genetic drift. Finally, our results indicate that some caution is needed for cryptic species, such as the two studied herein, regarding specific identification based only on pelage colouration and morphometric characters, further confounded by different measurement techniques employed by field workers, but these aspects are out of the scope of the current study.

CO N FLI C T O F I NTE R E S T
The authors declare no competing interest in this work.

DATA AVA I L A B I L I T Y S TAT E M E N T
The generated Cyt-b sequences have been deposited in the GenBank database (accession numbers Banklt2637337: OP763182-OP763342). Microsatellites' genotypes will be made available on the Dryad repository.